Learning Mask-aware CLIP Representations for Zero-Shot Segmentation (Supplementary Material)

Neural Information Processing Systems

In the supplementary material, we first introduce the technical details of the "frozen CLIP" approaches in Sec. 1, and then describe the dataset settings in Sec. 2. Figure 1 presents an overview of the "frozen CLIP" approach; note that all sub-images are resized before being passed to CLIP. Figure 2 compares three merge operations. We evaluate the performance of MAFT on Pascal-VOC, COCO-Stuff, and ADE20K. Pascal-VOC provides 10,582 images for training and 1,449 images for testing. ADE20K contains 25k images for training and 2k images for validation. Pascal-Context is an extension of Pascal-VOC 2010.


Learning Mask-aware CLIP Representations for Zero-Shot Segmentation

Neural Information Processing Systems

Recently, pre-trained vision-language models have been increasingly used to tackle the challenging zero-shot segmentation task. Typical solutions follow the paradigm of first generating mask proposals and then adopting CLIP to classify them. To maintain CLIP's zero-shot transferability, previous practices favour freezing CLIP during training. However, in this paper, we reveal that CLIP is insensitive to different mask proposals and tends to produce similar predictions for various mask proposals of the same image. This issue mainly relates to the fact that CLIP is trained with image-level supervision.
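The proposal-classification paradigm described in this abstract can be illustrated with a minimal numpy sketch: each mask proposal yields a region embedding, which is matched against class text embeddings by cosine similarity. The embeddings below are random stand-ins, not actual CLIP features:

```python
import numpy as np

def classify_proposals(region_embeds, text_embeds):
    """CLIP-style classification: cosine similarity between each
    mask-proposal embedding and each class text embedding."""
    r = region_embeds / np.linalg.norm(region_embeds, axis=1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    logits = r @ t.T                  # (num_proposals, num_classes)
    return logits.argmax(axis=1)      # predicted class per proposal

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 512))   # stand-ins for CLIP image embeddings
texts = rng.normal(size=(3, 512))     # stand-ins for CLIP text embeddings
print(classify_proposals(regions, texts))
```

The mask-insensitivity issue the paper identifies would show up here as near-identical region embeddings (and hence identical predictions) for different proposals of the same image.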


Consistent Structural Relation Learning for Zero-Shot Segmentation

Neural Information Processing Systems

Zero-shot semantic segmentation aims to recognize the semantics of pixels from unseen categories with zero training samples. Previous practice [1] proposed to train the classifiers for unseen categories using visual features generated from semantic word embeddings. However, the generator is learned only on the seen categories, while no constraint is applied to the unseen categories, leading to poor generalization ability. In this work, we propose a Consistent Structural Relation Learning (CSRL) approach to constrain the generation of unseen visual features by exploiting the structural relations between seen and unseen categories. We observe that different categories usually exhibit similar relations in either the semantic word-embedding space or the visual feature space.
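The structural-relation idea can be sketched in numpy: build pairwise cosine-similarity ("relation") matrices in both the word-embedding and visual-feature spaces, and penalize their disagreement. The MSE consistency term and all names here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def relation_matrix(feats):
    """Pairwise cosine-similarity matrix: the 'structural relations'
    among categories within one feature space."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return f @ f.T

def relation_consistency_loss(word_embeds, visual_feats):
    """Penalize disagreement between the relation structure of the
    semantic word-embedding space and the generated visual space."""
    r_sem = relation_matrix(word_embeds)
    r_vis = relation_matrix(visual_feats)
    return float(np.mean((r_sem - r_vis) ** 2))

rng = np.random.default_rng(1)
words = rng.normal(size=(4, 300))    # word embeddings for 4 categories
visual = rng.normal(size=(4, 256))   # generated visual features
print(relation_consistency_loss(words, visual))
```

Minimizing such a term ties the generated visual features of unseen categories to the relational structure already observable among word embeddings.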


Review for NeurIPS paper: Consistent Structural Relation Learning for Zero-Shot Segmentation

Neural Information Processing Systems

Summary and Contributions: Post-rebuttal update: I originally gave this paper an '8' and I will keep my original rating. The method is a good improvement upon [1]: it extends [1] with a simple and reproducible idea. Experimentally, the authors demonstrate good improvements over [1]. In contrast to R3, I think this is not only a decent amount of novelty, but also the simple kind of novelty that is likely to be adopted by other researchers. The other two main weaknesses highlighted by several reviewers were: 1) A better positioning w.r.t.


Review for NeurIPS paper: Consistent Structural Relation Learning for Zero-Shot Segmentation

Neural Information Processing Systems

Paper originally received a set of somewhat mixed reviews from four reviewers, with scores: 8, 5, 5, 6. Generally, the reviewers liked the work, commenting on how it addressed an important problem [R3] and presented a well-motivated idea [R1] that was novel [R2], simple, and reproducible [R1], ultimately resulting in good results [R1,R2,R3,R4]. Some shortcomings were also identified, including (1) unclear positioning and potentially limited novelty with respect to [1] [R1,R2,R3] and (2) a lack of sufficient comparisons to related work [R2,R3,R4]. The authors provided a very thorough rebuttal that addressed all major concerns, offering a compelling clarification of novelty (1) and additional experiments to address the reviewers' comments for (2). As a result, R2 and R3 raised their scores, arriving at final unanimously positive ratings for the paper of: 8, 7, 6, 6. The AC has read the reviews, the rebuttal, the resulting discussion, and the paper itself.



MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation

Koleilat, Taha, Asgariandehkordi, Hojat, Rivaz, Hassan, Xiao, Yiming

arXiv.org Artificial Intelligence

Segmentation of anatomical structures and pathological regions in medical images is essential for modern clinical diagnosis, disease research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing precise segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is still needed and highly relevant. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks from SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further. Extensive testing across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework. Our code is available at https://github.com/HealthX-Lab/MedCLIP-SAMv2.
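The hard-negative-aware contrastive fine-tuning can be illustrated generically: an InfoNCE-style image-text loss in which more-similar (harder) negatives receive larger weights. This is a hedged sketch of the general idea only, not the exact DHN-NCE formulation used by the authors:

```python
import numpy as np

def contrastive_hard_neg_loss(img, txt, tau=0.07, beta=0.25):
    """InfoNCE-style image-text contrastive loss where harder negatives
    (higher similarity) are up-weighted by exp(beta * sim).
    A generic sketch, not the paper's DHN-NCE loss."""
    i = img / np.linalg.norm(img, axis=1, keepdims=True)
    t = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    sim = (i @ t.T) / tau                    # (N, N); diagonal = positives
    n = sim.shape[0]
    pos = np.diag(sim)
    neg_mask = ~np.eye(n, dtype=bool)
    weights = np.exp(beta * sim) * neg_mask  # harder negatives weigh more
    denom = np.exp(pos) + (weights * np.exp(sim)).sum(axis=1)
    return float(np.mean(np.log(denom) - pos))

rng = np.random.default_rng(2)
image_embeds = rng.normal(size=(8, 512))   # stand-in image embeddings
text_embeds = rng.normal(size=(8, 512))    # stand-in paired text embeddings
print(contrastive_hard_neg_loss(image_embeds, text_embeds))
```

Since the denominator always includes the positive term, the loss is non-negative; the `beta` weighting is the part that emphasizes hard negatives during fine-tuning.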



MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation

Koleilat, Taha, Asgariandehkordi, Hojat, Rivaz, Hassan, Xiao, Yiming

arXiv.org Artificial Intelligence

Medical image segmentation of anatomical structures and pathology is crucial in modern clinical diagnosis, disease study, and treatment planning. To date, great progress has been made in deep learning-based segmentation techniques, but most methods still lack data efficiency, generalizability, and interactivity. Consequently, the development of new, precise segmentation methods that demand fewer labeled datasets is of utmost importance in medical image analysis. Recently, the emergence of foundation models, such as CLIP and the Segment-Anything-Model (SAM), with comprehensive cross-domain representations opened the door for interactive and universal image segmentation. However, the exploration of these models for data-efficient medical image segmentation is still limited, yet highly necessary. In this paper, we propose a novel framework, called MedCLIP-SAM, that combines the CLIP and SAM models to generate segmentations of clinical scans using text prompts in both zero-shot and weakly supervised settings. To achieve this, we employ a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss to fine-tune the BiomedCLIP model, and the recent gScoreCAM to generate prompts for obtaining segmentation masks from SAM in a zero-shot setting. Additionally, we explore the use of zero-shot segmentation labels in a weakly supervised paradigm to improve segmentation quality further. Through extensive testing on three diverse segmentation tasks and medical image modalities (breast tumor ultrasound, brain tumor MRI, and lung X-ray), our proposed framework demonstrates excellent accuracy. Code is available at https://github.com/HealthX-Lab/MedCLIP-SAM.


Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

Jiang, Zekun, Cheng, Dongjie, Qin, Ziyuan, Gao, Jun, Lao, Qicheng, Li, Kang, Zhang, Le

arXiv.org Artificial Intelligence

This study develops and evaluates a novel multimodal medical image zero-shot segmentation algorithm, named Text-Visual-Prompt SAM (TV-SAM), that requires no manual annotations. TV-SAM integrates the large language model GPT-4, the vision-language model GLIP, and the Segment Anything Model (SAM) to autonomously generate descriptive text prompts and visual bounding-box prompts from medical images, thereby enhancing SAM for zero-shot segmentation. Comprehensive evaluations are conducted on seven public datasets encompassing eight imaging modalities, demonstrating that TV-SAM can effectively segment unseen targets across various modalities without additional training, significantly outperforming SAM AUTO and GSAM, closely matching the performance of SAM BBOX with gold-standard bounding-box prompts, and surpassing the state of the art on specific datasets such as ISIC and WBC. The study indicates that TV-SAM serves as an effective multimodal medical image zero-shot segmentation algorithm, highlighting the significant contribution of GPT-4 to zero-shot segmentation. By integrating foundation models such as GPT-4, GLIP, and SAM, it could enhance the capability to address complex problems in specialized domains. The code is available at: https://github.com/JZK00/TV-SAM.
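The three-stage pipeline this abstract describes (text prompt → bounding box → mask) can be sketched with toy stand-ins; each function below merely mimics the role of the real GPT-4, GLIP, and SAM components, and all names and return values are hypothetical:

```python
import numpy as np

def generate_text_prompt(image):
    # GPT-4 stage (stand-in): describe the target in the medical image
    return "a round hypointense lesion"

def text_to_box(image, prompt):
    # GLIP stage (stand-in): ground the description to a box (x0, y0, x1, y1)
    h, w = image.shape
    return (w // 4, h // 4, 3 * w // 4, 3 * h // 4)

def box_to_mask(image, box):
    # SAM stage (stand-in): turn the box prompt into a binary mask
    mask = np.zeros_like(image, dtype=bool)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = True
    return mask

image = np.zeros((64, 64))           # toy single-channel scan
prompt = generate_text_prompt(image)
mask = box_to_mask(image, text_to_box(image, prompt))
print(mask.sum())                    # number of segmented pixels
```

The key design point is that every stage is promptable: swapping the toy functions for the real models yields a segmentation pipeline that needs no human annotation at inference time.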